The r² value is always positive, because the square of any number is positive. But the correlation coefficient r can be positive or negative, depending on whether the fitted line slopes upward or downward. If the fitted line slopes downward, make your r value negative.
Why did the program give you r² instead of r in the first place? It's because r² is a useful estimate called the coefficient of determination. It tells you what percent of the total variability in the Y variable can be explained by the fitted line.
An r² value of 1 means that the points lie exactly on the fitted line, with no scatter at all.
An r² value of 0 means that your data points are all over the place, with no tendency at all for the X and Y variables to be associated.
An r² value of 0.3 (as in this example) means that 30 percent of the variance in the dependent variable is explainable by the independent variable in this straight-line model.
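If you'd like to check this arithmetic yourself, here is a minimal R sketch. The data are made up for illustration (the original weight and SBP values behind the figure aren't reproduced here), so the numbers won't match the book's output, but the logic is the same: fit the line, read off r², and recover r by attaching the sign of the slope.

    # Hypothetical data standing in for the weight/SBP example
    weight <- c(60, 72, 85, 90, 68, 77, 95, 62, 88, 70)
    sbp <- c(118, 125, 140, 145, 122, 131, 150, 119, 138, 127)

    model <- lm(sbp ~ weight)              # fit the straight-line model
    r.squared <- summary(model)$r.squared  # coefficient of determination

    # r-squared is always positive; take its square root and attach
    # the sign of the slope to recover the correlation coefficient r
    r <- sign(coef(model)["weight"]) * sqrt(r.squared)
    r                                      # same value as cor(weight, sbp)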
Note: Figure 16-4 also lists the Adjusted R-squared at the bottom right. We talk about the adjusted r² value in Chapter 17 when we explain multiple regression, so for now, you can just ignore it.
The F statistic
The last line of the sample output in Figure 16-4 presents the F statistic and its associated p value (under F-statistic). These numbers address the question: Is the straight-line model any good at all? In other words, how much better is the straight-line model, which contains an intercept and a predictor variable, at predicting the outcome than the null model?
The null model is a model that contains only a single parameter representing a constant term
with no predictor variables at all. In this case, the null model would only include the intercept.
Under α = 0.05, if the p value associated with the F statistic is less than 0.05, then adding the predictor
variable to the model makes it statistically significantly better at predicting SBP than the null model.
For this example, the p value of the F statistic is 0.013, which is statistically significant. It means that using weight as a predictor of SBP is statistically significantly better than just guessing that everyone in the data set has the mean SBP (which is all the null model can do).
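You can make this comparison explicit in R. Continuing the hypothetical sketch from earlier in this section, the following fits the intercept-only null model and asks anova() to compare it to the straight-line model; the F statistic and p value it reports are the same ones that appear on the last line of the regression output.

    null.model <- lm(sbp ~ 1)  # intercept only: predicts the mean SBP for everyone
    anova(null.model, model)   # F test: is the straight-line model significantly better?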
Scientific fortune-telling with the prediction formula
As we describe in Chapter 15, one reason to do regression in biostatistics is to develop a prediction formula that allows you to make an educated guess about the value of the dependent variable if you know the values of the independent variables. You are essentially developing a predictive model.
Some statistics programs show the actual equation of the best-fitting straight line. If yours
doesn’t, don’t worry. Just substitute the coefficients of the intercept and slope for a and b in the
straight-line equation: Y = a + bX.
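For instance, sticking with the hypothetical model fitted earlier in this section, you can pull a and b out of R's coefficient table and compute a prediction by hand, or let the built-in predict() function do it for you (the weight of 80 is just an example value):

    a <- coef(model)[["(Intercept)"]]  # estimated intercept
    b <- coef(model)[["weight"]]       # estimated slope
    a + b * 80                         # predicted SBP for a weight of 80

    predict(model, newdata = data.frame(weight = 80))  # same prediction via predict()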